A very early step in any data processing is to understand how many rows are in a data frame, as this often represents the number of participants or total number of trials. This is useful to check at multiple steps of your data processing to make sure you have not done something wrong.
Why are there different numbers of rows in the three data frames when these data all come from the same participants?
Why are the numbers not round?
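One way to investigate questions like these is to compare the number of rows with the number of unique participants. The sketch below uses small made-up data frames (`toy_demographics` and `toy_trials` are hypothetical, not the course data) and assumes a `subject` column identifies participants:

```r
# hypothetical toy data, NOT the course data
toy_demographics <- data.frame(
  subject = c(1, 2, 3),
  age     = c(23, 31, 45)
)

# one row per participant per trial (3 participants x 4 trials)
toy_trials <- expand.grid(subject = c(1, 2, 3), trial = 1:4)

# rows in the participant-level data
nrow(toy_demographics)

# rows in the trial-level data
nrow(toy_trials)

# unique participants in the trial-level data
length(unique(toy_trials$subject))
```

If the row count and the count of unique participants differ, each participant contributes more than one row to that data frame.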
5.1.2 Viewing column names
How would you know what variables are in a data frame? You can view the data frame, but it can also be useful to print its column names. Knowing what you have is one of the first steps to working with it.
Code
# print all column names
colnames(data_demographics_raw)
# A tibble: 6 × 10
date time subject blockcode Blocknum and trialnu…¹ trialcode primestim
<chr> <time> <dbl> <chr> <chr> <chr> <dbl>
1 23.06.22 10:46:38 5.49e8 practice 1_4 prime_ne… 0
2 23.06.22 10:46:38 5.49e8 practice 1_5 prime_ne… 0
3 23.06.22 10:46:38 5.49e8 practice 1_6 prime_po… 0
4 23.06.22 10:46:38 5.49e8 test 2_1 instruct… 0
5 23.06.22 11:55:36 5.05e8 practice 1_4 prime_ne… 0
6 23.06.22 11:55:36 5.05e8 practice 1_5 prime_po… 0
# ℹ abbreviated name: ¹`Blocknum and trialnum`
# ℹ 3 more variables: targetstim <dbl>, correct <dbl>, latency <dbl>
5.2 The pipe (%>% or |>)
%>% is the original pipe created for the {magrittr} package and used throughout the tidyverse packages. It is slightly slower but also more flexible.
|> is a version of the pipe more recently added to base-R. It is slightly faster but less flexible.
If you’re not sure, it’s easier to use %>%.
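As a minimal sketch of the two pipes side by side (assuming the {magrittr} package is installed and R is version 4.1 or later; the vector `x` is made up for illustration):

```r
library(magrittr) # provides %>%

x <- c(1, 4, 9)

# the same two steps written with each pipe
result_magrittr <- x %>% sqrt() %>% sum()
result_base     <- x |> sqrt() |> sum()   # |> requires R 4.1 or later

# check they produce identical results
identical(result_magrittr, result_base)

# one example of the extra flexibility of %>%:
# the parentheses are optional, whereas `x |> sqrt` is a syntax error
x %>% sqrt
```

For simple chains like this one the two pipes are interchangeable.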
5.2.1 What is the pipe?
The output of whatever is to the left of the pipe is used as the input to the function on the right of the pipe. By default it is supplied as the first argument, which in tidyverse functions is the data argument.
Code
library(janitor)

# use a function without the pipe
example_without_pipe <- janitor::clean_names(data_demographics_raw)

# use a function with the pipe
example_with_pipe <- data_demographics_raw %>%
  janitor::clean_names()

# check they produce identical results
identical(example_without_pipe, example_with_pipe)
[1] TRUE
5.2.2 Why use the pipe?
The pipe allows us to write code that reads from top to bottom, following a series of steps, in the way that humans organize and describe steps. Without the pipe, code is written from the inside out, in the way that the computer understands it but humans do not as easily.
The utility of this becomes more obvious when there are many steps:
Code
# use a series of functions without the pipe
example2_without_pipe <-
  summarise(
    group_by(
      mutate(
        rename(
          clean_names(dat = data_amp_raw),
          unique_id = subject,
          block = blockcode,
          trial_type = trialcode,
          rt = latency
        ),
        fast_trial = ifelse(rt < 100, 1, 0)
      ),
      unique_id
    ),
    percent_fast_trials = mean(fast_trial) * 100
  )

# use a series of functions with the pipe
example2_with_pipe <- data_amp_raw %>%
  # clean the column names
  clean_names() %>%
  # rename the columns
  rename(unique_id = subject,
         block = blockcode,
         trial_type = trialcode,
         rt = latency) %>%
  # create a new variable using existing ones
  mutate(fast_trial = ifelse(rt < 100, 1, 0)) %>%
  # summarize across trials for each participant
  group_by(unique_id) %>%
  summarise(percent_fast_trials = mean(fast_trial) * 100)

# check they produce identical results
identical(example2_without_pipe, example2_with_pipe)
[1] TRUE
5.3 Using the pipe & cleaning column names
It is almost always useful to start by converting all column names to ones that play nice with R/tidyverse and which use the same naming convention (e.g., snake_case, which is standard in tidyverse).
How would you bring up the help menu to understand how janitor::clean_names() works?
---
title: "The pipe"
format:
  html:
    toc: true
    toc_float: true
    code-fold: show
    code-tools: true
    self-contained: true
---

```{r}
#| include: false

# settings, placed in a chunk that will not show in the .html file (because include=FALSE)

# disables scientific notation so that small numbers appear as eg "0.00001" rather than "1e-05"
options(scipen = 999)
```

## Exploring data

### Count number of rows

A very early step in any data processing is to understand how many rows are in a data frame, as this often represents the number of participants or total number of trials. This is useful to check at multiple steps of your data processing to make sure you have not done something wrong.

```{r}
library(readr)
library(dplyr)

# demographics data
data_demographics_raw <- read_csv(file = "../data/raw/data_demographics_raw.csv")

# self report measure data
data_selfreport_raw <- read_csv(file = "../data/raw/data_selfreport_raw.csv")

# affect attribution procedure data
data_amp_raw <- read_csv(file = "../data/raw/data_amp_raw.csv")

nrow(data_demographics_raw)
nrow(data_selfreport_raw)
nrow(data_amp_raw)
```

- Why are there different numbers of rows in the three data frames when these data all come from the same participants?
- Why are the numbers not round?

### Viewing column names

How would you know what variables are in a data frame? You can view the data frame, but it can also be useful to print its column names. Knowing what you have is one of the first steps to working with it.

```{r}
# print all column names
colnames(data_demographics_raw)

# print all column names as a vector
dput(colnames(data_demographics_raw))

data_demographics_raw %>%
  colnames() %>%
  dput()

data_selfreport_raw %>%
  colnames() %>%
  dput()

data_amp_raw %>%
  colnames() %>%
  dput()
```

### Viewing column names and types

```{r}
head(data_demographics_raw)
head(data_selfreport_raw)
head(data_amp_raw)
```

## The pipe (`%>%` or `|>`)

`%>%` is the original pipe created for the {magrittr} package and used throughout the tidyverse packages. It is slightly slower but also more flexible.

`|>` is a version of the pipe more recently added to base-R. It is slightly faster but less flexible.

If you're not sure, it's easier to use `%>%`.

### What is the pipe?

The output of the function to the left of the pipe is used as the input to the function to the right of the pipe.

``` text
[this function's output...] %>% [...becomes this function's input]
```

```{r}
library(janitor) # for clean_names()
library(dplyr) # for %>%

# without the pipe
example_without_pipe <- janitor::clean_names(data_demographics_raw)

# with the pipe
example_with_pipe <- data_demographics_raw %>%
  janitor::clean_names()
```

### Why use the pipe?

The pipe allows us to write code that reads from top to bottom, following a series of steps, in the same way that humans would describe and conduct the steps. Without the pipe, code is written from the inside out in the way that R understands it but humans do not as easily.

The utility of the pipe becomes more obvious when there are many steps in the workflow.

The following example uses functions we have not learned yet. We'll cover them in later chapters. For the moment, the point is to demonstrate the usage of the pipe.

This is it without the pipe:

```{r}
example2_without_pipe <-
  summarise(
    group_by(
      mutate(
        rename(
          clean_names(dat = data_amp_raw),
          unique_id = subject,
          block = blockcode,
          trial_type = trialcode,
          rt = latency
        ),
        fast_trial = ifelse(rt < 100, 1, 0)
      ),
      unique_id
    ),
    percent_fast_trials = mean(fast_trial) * 100
  )
```

This is it with the pipe:

```{r}
example2_with_pipe <- data_amp_raw %>%
  # clean the column names
  clean_names() %>%
  # rename the columns
  rename(unique_id = subject,
         block = blockcode,
         trial_type = trialcode,
         rt = latency) %>%
  # create a new variable using existing ones
  mutate(fast_trial = ifelse(rt < 100, 1, 0)) %>%
  # summarize across trials for each participant
  group_by(unique_id) %>%
  summarise(percent_fast_trials = mean(fast_trial) * 100)
```

You can verify that the pipe produces identical results, but makes for much more readable code:

```{r}
# check they produce identical results
identical(example2_without_pipe, example2_with_pipe)
```

## Which argument is the input set to?

Which argument does the pipe feed the input to? The {magrittr} pipe (`%>%`), which {dplyr} re-exports, supplies the input as the first argument of the function on its right.

Because {tidyverse} packages, and many others, are written with the pipe in mind, their functions take the data as the first argument (often named `.data`), so many work natively with the pipe. If a function doesn't work natively with the pipe - that is, if the function expects the data somewhere other than the first argument and throws an error - you can specify where the input goes manually as "`.`":

```{r}
data_demographics_raw %>%
  janitor::clean_names(dat = .)
```

The base R pipe (`|>`) also always supplies the input as the first argument. This can be overridden with `_` (in R 4.2 or later), although this is more restrictive than `.`: the `_` placeholder must be passed to a named argument.

```{r}
data_demographics_raw |>
  janitor::clean_names(dat = _)
```

## Using the pipe & cleaning column names

It is almost always useful to start by converting all column names to ones that play nice with R/tidyverse and which use the same naming convention (e.g., snake_case, which is standard in tidyverse).

How would you bring up the help menu to understand how `janitor::clean_names()` works?

Rewrite each of the below to use the pipe.

```{r}
data_demographics_clean_names <- data_demographics_raw %>%
  clean_names()

data_selfreport_clean_names <- data_selfreport_raw %>%
  clean_names()

data_amp_clean_names <- data_amp_raw %>%
  clean_names()
```
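
Returning to the question of which argument the pipe feeds its input to: the placeholders matter most when the data is not a function's first argument. The sketch below is illustrative only. It uses R's built-in `mtcars` data and `lm()` (neither is part of the course materials) and assumes {magrittr} is installed and R is version 4.2 or later:

```{r}
library(magrittr) # provides %>%

# lm() expects a formula first and the data second,
# so the piped input has to be routed to `data` explicitly
fit_magrittr <- mtcars %>% lm(mpg ~ wt, data = .)
fit_base     <- mtcars |> lm(mpg ~ wt, data = _)   # `_` requires R 4.2 or later

# both pipes fit the same model
all.equal(coef(fit_magrittr), coef(fit_base))
```

Without the placeholder, either pipe would pass `mtcars` as the first argument, where `lm()` expects the formula.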